Method and system for gathering information resident on global computer networks

ABSTRACT

A method and system for confidentially accessing and reporting information present on global computer networks. The present invention deterministically analyzes a set of network resources over a configurable monitoring period, thereby guaranteeing that recently published information is retrieved. The present invention includes a scalable software system that can be readily executed on a stand-alone computing system or distributed across a network of computing devices. At the end of each monitoring period, the present invention balances the traversal and searching of network resources across the computing devices in the distributed system according to the previous number of pages retrieved for each network resource, thereby more accurately balancing the system. Furthermore, in order to reduce system resource requirements, the present invention searches only those network resources that are targeted either individually or as an industry. In addition, the present invention further conserves computing resources by not searching documents or files that have already matched search criteria and have remained unchanged.

RELATED APPLICATION

This application claims priority from U.S. Provisional Application Ser. No. 60/091,348, filed Jul. 1, 1998.

TECHNICAL FIELD

This invention relates generally to the field of information management, and more particularly to a method and system for confidentially tracking and reporting information available on global computer networks.

BACKGROUND

The Internet has experienced exponential growth and the number of interconnected computers is quickly approaching one billion worldwide. As such, the Internet provides unprecedented access to massive volumes of information and resources. An entity resource, such as a company, organization, periodical, etc., presents information to the Internet by uploading the information to a server that is connected to one of the interconnected networks and has a registered Internet Protocol (IP) address. Often, an entity organizes its information on the server as a hierarchy of pages composed with hypertext markup language (HTML). Along with general information, each page may contain links to other informative items including graphics, documents or even links to other web sites. Users can easily access an entity's information using a graphical software program referred to as a browser. Because the Internet is essentially a vast web of interconnected computers, databases, systems and networks, an entity's information is often referred to as its “website”. For this reason, the Internet and its interconnected web sites are often referred to as the World Wide Web. Finding relevant information on the Internet, including the millions of websites and the billions of individual web pages, is a difficult task that has been inadequately addressed.

Many companies have developed search engines in an attempt to ease the location and retrieval of information from the Internet. Examples of current search systems include the AltaVista™ search engine developed by Digital Equipment Corp., Lycos™, Infoseek™, Excite™ and Yahoo™. Most conventional search systems consist of two components. First, a data gathering component, known as a webcrawler or robot, systematically traverses the Internet and retrieves information from various websites. Often, the webcrawler moves from website to website traversing every link found. As the individual websites are accessed, each page of information is retrieved, analyzed and stored for subsequent searching and retrieval. After retrieving and examining each page of a website, the webcrawler moves on to another site on the Internet. While the webcrawler is traversing various websites and retrieving the pages of information, the webcrawler indexes the information presented by each page and stores a link to each page and the corresponding index information in a repository such as a database.

The second component of conventional search systems is the search engine. The search engine provides an interface for selecting the links stored in the repository in order to identify web pages with desired content. For example, the above mentioned search engines allow a user to enter various search criteria. The search engine probes the stored index information generated by the webcrawler according to the search criteria. The search controller presents to the user any stored links having corresponding index information that satisfies the entered search criteria. The user is able to view the actual page located on the original website by following the link to the actual website.

SUMMARY

The present invention is directed to a method and system for systematically tracking a defined set of network resources on a global computing network. The method and system can be arranged to deterministically guarantee that any information from the sites is relevant and current. The method and system also can be arranged to increase the confidentiality of search parameters and the identities of parties seeking information.

In one embodiment, the present invention provides a computer-implemented method for gathering information from network resources on a global computer network, the method comprising assigning search times to the network resources, the search times designating times at which the network resources are to be searched within a monitoring period, categorizing the network resources into industry groups, generating search items, each of the search items defining a search for particular information and designating one or more of the industry groups, identifying, at a given search time, the network resources that have been assigned the given search time and categorized into industry groups designated by one or more of the search items, retrieving and storing information from the identified network resources, and performing the searches defined by one or more of the search items on the stored information.

In another embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising assigning search times to the network resources, the search times designating times at which the network resources are to be searched within a monitoring period, generating search items, each of the search items defining a search for particular information and designating one or more of the network resources, identifying, at a given one of the search times, the network resources that have been assigned the given search time and which are designated by one or more of the search items, retrieving and storing information from the identified network resources, whereby information from the network resources that have not been assigned the given search time or are not designated by one or more of the search items is not retrieved and stored, and performing the searches defined by one or more of the search items on the stored information.

In a further embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising generating a set of search items, each of the search items defining a search for particular information and designating one or more of the network resources, retrieving and storing information from the network resources designated by one or more of the search items, performing the searches defined by one or more of the search items on the stored information, and presenting results of the searches.

In an added embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising categorizing the network resources into industry groups, generating a set of search items, each of the search items defining a search for particular information and designating one or more of the industry groups, retrieving and storing information from the network resources associated with the industry groups designated by one or more of the search items, performing the searches defined by the search items on the stored information, and presenting results of the searches.

In another embodiment, the present invention provides a method for gathering information from network resources on a global computer network, the method comprising selecting a set of network resources residing on the global computer network, assigning a search time to each of the network resources, the search time indicating a time within a monitoring period in which the network resource is to be searched, generating a set of search items, each of the search items defining parameters for a search and designating one or more of the network resources to be searched, determining, at approximately the search time for each of the network resources, whether the respective network resource is designated for searching by at least one of the search items, retrieving and storing information from the network resources designated by at least one of the search items, performing the searches defined by the search items on the stored information, and presenting results of the searches to users.

In a further embodiment, the present invention provides a software system for monitoring network resources residing on a global computer network over a time interval, the system comprising a database storing resource identifiers that correspond to particular network resources, and search items that define a search for information and specify one or more of the network resources, a system executive that constructs a set of the resource identifiers scheduled to be searched, and a set of the search items specifying at least one of the network resources corresponding to one of the resource identifiers of the constructed resource identifier set, a collection controller, for each of the resource identifiers of the constructed set of resource identifiers, the collection controller retrieving information presented by the networked resource corresponding to the resource identifier, a search controller for receiving the information retrieved by each of the collection controllers, and a search instance, for each search item of the search item list, wherein the search controller instantiates each search instance to perform the search defined by the respective search item on the information received from the collection controllers for the network resource specified by the respective search item.

In an added embodiment, the present invention provides a method for monitoring information presented by at least one of a plurality of networked computers comprising storing a plurality of identifiers, wherein each identifier corresponds to one of the plurality of networked computers, storing a plurality of search items, wherein each search item includes search criteria and at least one networked computer to be monitored, generating a set of identifiers to be searched, generating a set of search items monitoring at least one of the networked computers corresponding to one of the identifiers of the identifier set, retrieving information presented by each of the networked computers corresponding to an identifier of the identifier set, and searching the retrieved information according to the search criteria of each search item of the search item set monitoring the networked computer corresponding to the retrieved information.

Other advantages, features, and embodiments of the present invention will become apparent from the following detailed description and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of a software system for accessing and reporting network resources on global computer networks in accordance with the present invention;

FIG. 2 is a flow chart illustrating a high-level operation of a system executive in order to control the various software components of the software system;

FIG. 3 is a flow chart illustrating one mode of operation in which the system executive controls the software system to access and search the network resources that are due to be searched and currently targeted by a search item;

FIG. 4 is a flow chart illustrating one mode of operation of a collection controller responsible for traversing a single network resource;

FIG. 5 is a flow chart illustrating one mode of operation of a web crawler responsible for retrieving a single informative item and extracting any links to other informative items;

FIG. 6 is a flow chart illustrating one mode of operation of a search controller responsible for managing the analysis of each informative item retrieved by the collection controllers;

FIG. 7 is a flow chart illustrating one mode of operation in which the software system restarts the monitoring cycle and balances the retrieval and searching of the network resources across a plurality of computing devices according to the actual number of pages previously retrieved from each network resource;

FIG. 8 is one example of a report generated by the software system for reporting matching information to a client;

FIG. 9 is a block diagram of a computing system having a plurality of computing devices suitable for executing the software system in a distributed manner; and

FIG. 10 is a block diagram of one embodiment of a global networked environment in which a service center executes a software system in accordance with the present invention.

DETAILED DESCRIPTION

In the following detailed description, references are made to the accompanying drawings which illustrate specific embodiments in which the invention may be practiced. Electrical, mechanical and programmatic changes may be made to the embodiments without departing from the spirit and scope of the present invention. The following detailed description is, therefore, not to be taken in a limiting sense and the scope of the present invention is defined by the appended claims and their equivalents.

Conventional search systems are deficient in many ways. For example, due to the vast information and myriad of sites residing on the Internet, conventional search systems produce excess, irrelevant information. A rather narrowly defined search on many of the conventional systems may easily produce thousands of references. Because the webcrawler traverses each and every site that it finds on the Internet, valuable information is often lost among thousands of references to irrelevant sites. Furthermore, conventional systems are, in a sense, non-deterministic. The matching links presented to the user by the search controller often no longer exist. Furthermore, the index information stored in the repository for a particular page is often incorrect and does not contain recently released information. In addition, conventional search engines require huge resources to store the index information and links for subsequent analysis.

Conventional search engines are also incredibly labor intensive. In order to search for specific information on the internet, a user is forced to access one or more publicly available search systems, enter its search criteria and manually parse the results. This process is tedious and time consuming. For example, the user is forced to periodically repeat the process in order to determine if any new information has been released. In order to identify any new information, however, the user is forced to parse through the previous information already examined.

Conventional search systems are also non-confidential. For example, in order to reduce the numerous irrelevant references produced by conventional systems as described above, a user must narrowly define the search criteria. Often, the user is forced to provide a fairly comprehensive description of the desired information before the number of matches approaches a manageable number. This, however, is problematic in that it forces the user to divulge the idea being researched. For this reason, there is currently no feasible mechanism to search the Internet without divulging trade secrets or other intellectual property. The inability to confidentially retrieve information from the Internet manifests itself in other areas besides the use of conventional search engines. For example, many web sites provide a local search mechanism to assist in finding information within the web site. A user is able to access a web page and find all relevant information simply by engaging the search mechanism. This, however, forces the user to describe the desired information in detail and disclose the information to the website. Thus, the user is unknowingly revealing the details regarding the desired information. Furthermore, because the IP address of a user is readily available to the host site, not only is the information revealed, but the user is easily identified.

FIG. 1 is a block diagram of a software system 10 for confidentially accessing and reporting information present on global computer networks, such as the Internet, in accordance with the present invention. Software system 10 includes system executive 20, one or more collection controllers 30, one or more web crawlers 40, search controller 50, one or more customer search instances 60, report generator 70, database manager 80 and user interface 90.

System executive 20 is responsible for overall control and management of software system 10. FIG. 2 is a flow chart illustrating one mode of operation of system executive 20. Upon initial execution of software system 10, system executive 20 starts execution in step 100, immediately proceeds to step 102 and instantiates database manager 80 for managing all accesses to a database (not shown). In one embodiment, database manager 80 has its own thread of execution. Preferably, database manager 80 has a client/server interface whereby other components of software system 10 initiate a remote procedure call in order to access the data of the database. In this manner, all accesses of the database are synchronized and inherently thread safe. Upon instantiating database manager 80, system executive 20 commands database manager 80 to retrieve configuration data from the database. Typical configuration data includes a maximum number of collection controllers 30 that may be instantiated concurrently, a maximum number of concurrent web crawlers 40 and a maximum number of concurrent customer search instances 60.

System executive 20 proceeds from step 102 to step 104 and waits for a control message. Control messages can be issued to system executive 20 in two ways. First, user interface 90 presents a graphical interface by which an operator controls software system 10. After receiving input from the operator, user interface 90 communicates a control message to system executive 20. Second, software system 10 includes a timer thread (not shown) that awakens at user-configurable times and sends control messages to system executive 20, thereby triggering automatic execution of software system 10. Referring again to FIG. 2, system executive 20 receives control messages in step 104 and sequentially executes steps 106 through 114 to determine the nature of the received control message.
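The wait-and-dispatch behavior of steps 104 through 114 can be sketched as a loop that blocks on a message queue. This is a minimal illustration only: the function run_executive, the use of a Python queue, and the "Shutdown" message that ends the loop are assumptions made for the example. Only the StartTracking and RestartSearchCycle message names come from this description.

```python
import queue


def run_executive(messages, handlers, stop_message="Shutdown"):
    """Wait for control messages and dispatch each to its handler.

    `messages` is a queue.Queue fed by the user interface or a timer thread.
    `stop_message` is a hypothetical message used here only to end the loop.
    """
    handled = []
    while True:
        message = messages.get()          # step 104: block until a message arrives
        if message == stop_message:
            break
        handler = handlers.get(message)   # steps 106-114: identify the message
        if handler is not None:
            handler()
            handled.append(message)
    return handled


if __name__ == "__main__":
    inbox = queue.Queue()
    for name in ("StartTracking", "RestartSearchCycle", "Shutdown"):
        inbox.put(name)
    print(run_executive(inbox, {
        "StartTracking": lambda: print("tracking"),
        "RestartSearchCycle": lambda: print("restarting cycle"),
    }))
```

Unrecognized messages are silently ignored in this sketch; an actual system might log or reject them.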

If a StartTracking control message is received, system executive 20 proceeds from step 106 to step 116 and analyzes information present on network resources in accordance with the present invention. FIG. 3 illustrates one mode of operation of system executive 20 for analyzing network resources that are due to be tracked. In step 116, system executive 20 proceeds to step 128 and retrieves information on the daily resources that are due to be analyzed. More specifically, the database of software system 10 stores a plurality of resource identifiers, each identifier corresponding to a resource residing on the global computer network. In one embodiment, the database stores a plurality of domains for monitoring. Each domain identifies a website of a company, government body or other organization. Each resource identifier is categorized into one of a plurality of industry groups. Each resource includes a search date that indicates when the resource is to be searched within the monitoring period. As discussed below, software system 10 deterministically monitors the resources over a configurable period such as one week, one month or even one year. In other embodiments, the database may store a plurality of domains that identify web-based databases, such as trademark, domain name, or toll free telephone number databases, for monitoring of competitive activity or availability of such assets. Thus, the databases can be analyzed in a systematic fashion to maintain a “watch” for activity with respect to such assets.

In addition to a plurality of resource identifiers, the database contains a plurality of search items. Each search item includes general information, such as a type which may be patent, trademark, etc., an abstract and search criteria. Furthermore, each search item designates one or more network resources or industry groups to be monitored. In step 128, system executive 20 instructs database manager 80 to retrieve: (1) a set of the stored search items, and (2) a set of pending network resources that are due to be searched and are designated by at least one of the search items. In this manner, software system 10 need not waste computing resources in order to analyze network resources that are not being tracked.
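The selection of pending network resources can be illustrated with a short sketch. The record layouts (Resource, SearchItem), the field names, and the rule that a resource is "due" when its search date is on or before the current date are assumptions chosen for the example. The description itself states only that a resource must be due and must be designated by at least one search item, individually or through its industry group.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Resource:
    domain: str          # resource identifier
    industry: str        # industry group of the resource
    search_date: date    # when the resource is to be searched in the period


@dataclass(frozen=True)
class SearchItem:
    name: str
    criteria: str
    resources: frozenset = frozenset()    # individually designated domains
    industries: frozenset = frozenset()   # designated industry groups


def is_designated(resource, item):
    """True if the search item targets the resource directly or by industry."""
    return resource.domain in item.resources or resource.industry in item.industries


def pending_resources(resources, items, today):
    """Resources that are due (assumed: search_date <= today) and designated."""
    return [r for r in resources
            if r.search_date <= today
            and any(is_designated(r, item) for item in items)]
```

A resource that is due but not designated by any search item is skipped, which is how the system avoids spending computing resources on untracked sites.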

Upon receiving the daily tracking information from database manager 80 in step 128 (FIG. 3), system executive 20 proceeds to step 130 and instantiates a corresponding collection controller 30 for each pending network resource, subject to the user-configured maximum number of concurrently executing collection controllers 30. Each collection controller 30 is responsible for analyzing the website of its corresponding resource. In one embodiment, each collection controller 30 has its own thread of execution and receives an address, known as the base address, of the network resource to be analyzed. For example, the base address may be “www.netshadow.com”.

After spawning the maximum number of collection controllers 30, system executive 20 proceeds to step 132 and waits for one of the executing collection controllers 30 to finish traversing the corresponding network resource and retrieving its contents. When a collection controller 30 signals completion, system executive 20 proceeds to step 134 and instructs database manager 80 to update the schedule data for the network resource traversed by the finished collection controller 30. In this manner, database manager 80 updates the database such that the traversed network resource will not be traversed again until the next monitoring period. After updating the database, system executive 20 proceeds to step 136 and determines whether there are more network resources scheduled to be traversed and analyzed. If so, system executive 20 jumps back to step 130 and spawns another collection controller 30. If not, system executive 20 proceeds to step 138 and determines whether one or more collection controllers 30 are currently traversing network resources. If so, system executive 20 jumps back to step 132 and waits for another collection controller 30 to finish. When all the collection controllers 30 have finished traversing the pending network resources, system executive 20 returns to step 104 of FIG. 2.
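The spawn, wait and reschedule loop of steps 130 through 138 can be sketched with a bounded pool of worker threads. The function run_tracking and its parameters are hypothetical names; the thread pool from Python's standard library is a substitute for whatever threading mechanism an actual embodiment uses.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_tracking(pending, traverse, mark_done, max_controllers):
    """Traverse every pending resource with at most `max_controllers` at once.

    `traverse(resource)` stands in for a collection controller and
    `mark_done(resource)` for the schedule update made by the database manager.
    """
    waiting = list(pending)
    running = {}  # future -> resource being traversed
    with ThreadPoolExecutor(max_workers=max_controllers) as pool:
        while waiting or running:                      # steps 136 and 138
            while waiting and len(running) < max_controllers:
                resource = waiting.pop(0)              # step 130: spawn a controller
                running[pool.submit(traverse, resource)] = resource
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)  # step 132
            for future in done:
                resource = running.pop(future)
                future.result()                        # re-raise traversal errors
                mark_done(resource)                    # step 134: update schedule
```

Because a new controller is started as soon as any running one finishes, the number of concurrent traversals stays at the configured maximum until the pending set is exhausted.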

FIG. 4 is a flow-chart illustrating in detail one mode of operation of an executing collection controller 30. Upon creation by system executive 20, collection controller 30 begins execution in step 139 and immediately proceeds to step 140. In step 140, collection controller 30 creates a “pending link list” for holding links to be followed. Initially, collection controller 30 inserts the base address received from system executive 20. After initializing the pending link list, collection controller 30 proceeds to step 142 and instantiates a web crawler 40 for each link stored in the pending link list, subject to the user-configured maximum concurrent web crawlers. Each web crawler 40 is responsible for retrieving the content of the informative item pointed to by its link. For example, the web crawler 40 may download and store an entire HTML page, a file published using Adobe Acrobat, a graphic file, etc. As described in more detail below, each web crawler 40 also retrieves any links to other informative items the item contains.

When first executing step 142, collection controller 30 creates a single web crawler 40 for retrieving the item pointed to by the base address. In step 144, collection controller 30 waits for a web crawler 40 to finish. When a web crawler 40 has finished retrieving the content of the informative item pointed to by its link, collection controller 30 proceeds to step 146 and receives any links the finished web crawler may have found. Collection controller 30 scans the pending link list and inserts any newly found links that: (1) are not already on the pending link list and (2) have not already been followed. In step 148, collection controller 30 creates a token (data structure) that describes the information retrieved by finished web crawler 40 and adds the token to token queue 55. In step 148, collection controller 30 deletes the instantiation of the finished web crawler 40, proceeds to step 150 and determines whether any links are pending. If so, collection controller 30 returns to step 142 and spawns another web crawler 40. If no links are pending, collection controller 30 proceeds to step 152 and determines whether any web crawlers 40 are currently executing. If so, collection controller 30 returns to step 144 and waits for one of the executing web crawlers 40 to finish. If no web crawlers 40 are currently executing, collection controller 30 proceeds from step 152 to step 154 and signals system executive 20 that the network resource has successfully been traversed. After signaling system executive 20, collection controller 30 proceeds to step 156 and terminates.
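The pending link list logic can be sketched as follows. For brevity the sketch follows links one at a time rather than spawning concurrent web crawlers, and the names traverse and crawl, as well as the token layout, are assumptions made for the example. The part it is meant to show is the rule that a newly found link is queued only if it is neither pending nor already followed.

```python
import queue
from collections import deque


def traverse(base_address, crawl, token_queue):
    """Traverse one network resource starting at `base_address`.

    `crawl(link)` stands in for a web crawler and returns
    (local_filename, file_type, links_found). Each retrieved item is
    described by a token placed on `token_queue` (a queue.Queue).
    Returns the number of items retrieved.
    """
    pending = deque([base_address])   # the pending link list
    followed = set()                  # links already followed
    while pending:
        link = pending.popleft()
        followed.add(link)
        local_file, file_type, found = crawl(link)
        for new_link in found:
            # Insert only links that are not pending and not yet followed.
            if new_link not in followed and new_link not in pending:
                pending.append(new_link)
        token_queue.put({"filename": local_file, "type": file_type, "link": link})
    return len(followed)
```

The returned count corresponds to the number of pages retrieved from the resource, a value the description later uses for load balancing.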

In one embodiment, collection controller 30 maintains and stores a list of successfully crawled links as it traverses the network resource. This embodiment is useful in the event that software system 10 terminates before collection controller 30 is able to completely traverse the network resource. In this case, the next time collection controller 30 attempts to traverse the same network resource it loads the archived list of successfully crawled links. In this manner, collection controller 30 continues to traverse the network resource without retrieving previously retrieved informative items.

In yet another embodiment, collection controller 30 waits a configured delay time before spawning each web crawler 40. In this manner, collection controller 30 ensures a reasonable loading on the network resource being traversed. This aspect is also advantageous in giving the appearance of manually traversing the network resource. For example, in another embodiment, collection controller 30 waits a random delay time, within a range of possible delay times, between the spawning of web crawlers 40, thereby giving the appearance of manually traversing a network resource.
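A random delay within a configured range might be sketched as below. The generator name paced and its parameters are hypothetical, and a uniform distribution is an assumption, since the description does not say how the random delay is drawn.

```python
import random
import time


def paced(links, min_delay, max_delay, sleep=time.sleep, rng=random):
    """Yield links one at a time, pausing a random interval before each.

    The delay is drawn uniformly from [min_delay, max_delay] seconds.
    `sleep` and `rng` are injectable so the pacing can be tested without
    actually waiting.
    """
    for link in links:
        sleep(rng.uniform(min_delay, max_delay))
        yield link
```

Setting min_delay equal to max_delay gives the fixed configured delay of the first embodiment described above.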

FIG. 5 is a flow-chart illustrating one mode of operation of web crawler 40. When web crawler 40 is instantiated by collection controller 30, it receives a link to an informative item such as an HTML page, a graphic, an Acrobat file, etc. Web crawler 40 begins execution at step 160, immediately proceeds to step 162 and opens an HTTP connection with the network resource pointed to by the link. Once an HTTP connection is established, web crawler 40 proceeds to step 164 and creates a local file to hold the retrieved informative item. In step 166, web crawler 40 downloads the informative item into the local file. After downloading the item, web crawler 40 proceeds to step 168 and scans the local file for any links to other items. Upon scanning the file, web crawler 40 proceeds to step 170 and signals collection controller 30. After communicating the name of the local file and any newly found links to collection controller 30, web crawler 40 proceeds to step 172 and terminates.
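The link scan of step 168 can be illustrated with Python's standard HTML parser. The class and function names are hypothetical, and treating every href and src attribute as a link is an assumption; the description does not specify which attributes are examined. The download itself (steps 162 through 166) is omitted so that the sketch runs without network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from href and src attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links against the address of the page.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html_text, base_url):
    """Return the links found in a downloaded HTML item, in document order."""
    parser = LinkExtractor(base_url)
    parser.feed(html_text)
    return parser.links
```

A caller traversing a single network resource would typically filter the result to links under the base address before adding them to the pending link list.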

Search controller 50 receives tokens from collection controllers 30 via token queue 55 and is responsible for determining whether a retrieved item satisfies the search criteria of one or more of the search items stored in the database. Each token includes a filename of a local file holding an informative item for searching as well as a type field indicating the file type.

FIG. 6 is a flow-chart illustrating one mode of operation of search controller 50. Upon creation by system executive 20, search controller 50 begins execution in step 180 and proceeds to step 181 where it receives a set of search items from system executive 20. Next, search controller 50 proceeds to step 182 and waits for tokens to be placed in the token queue 55 by collection controller 30. When a token is received, search controller 50 proceeds to step 184. In step 184, search controller 50 retrieves the filename and file type from the token, opens the local file indicated by the filename and generates a hash table and a checksum based on the content of the local file.

After generating the hash table and the checksum, search controller 50 proceeds to step 185 and queries database manager 80 to determine whether an informative item having the same link address and checksum has already matched a search. If so, search controller 50 jumps to step 200, deletes the token, returns to step 182 and waits for the next token. In this fashion, search controller 50 conserves computing resources by not searching documents or files that have already matched search criteria and have remained unchanged.
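The hash table, checksum and unchanged-item test of steps 184 and 185 can be sketched as follows. The choice of SHA-256 as the checksum, the layout of the hash table (word to list of word positions), and the dictionary standing in for the database are all assumptions made for illustration; the description does not specify any of them.

```python
import hashlib
import re


def checksum(content: bytes) -> str:
    """Checksum of an item's raw content (SHA-256 is an assumed choice)."""
    return hashlib.sha256(content).hexdigest()


def build_hash_table(text: str) -> dict:
    """Map each lowercase word to the list of positions where it occurs."""
    table = {}
    for position, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        table.setdefault(word, []).append(position)
    return table


def already_matched(matched: dict, link: str, content: bytes) -> bool:
    """True if this link previously matched a search with identical content.

    `matched` maps a link address to the checksum stored when it matched.
    """
    return matched.get(link) == checksum(content)
```

When already_matched returns True the item can be discarded without running any search instance; any change to the content changes the checksum and causes the item to be searched again.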

If the test in step 185 fails, search controller 50 advances to step 186 and instantiates a search instance 60 for each search item received from system executive 20, subject to the user-configured maximum concurrent search instances 60. Each search instance 60 is responsible for testing the hash table with the search criteria of the corresponding search item. For example, each search item has one or more search strings similar to the following:

(semicond!*wafer)+(fabric!*chip!)+(memory w/2 module)

where ‘*’ signifies boolean AND, ‘+’ signifies boolean OR, ‘!’ is an expansion operator and ‘w/x’ means within x words.
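One way a search instance might test such a search string against the hash table is sketched below. Several points are assumptions rather than statements of the described system: the hash table maps each word to its positions, ‘!’ is treated as prefix expansion, ‘w/x’ is treated as "at most x word positions apart in either order", and ‘*’ is given higher precedence than ‘+’ when parentheses are absent. The function names are hypothetical and malformed expressions are not handled.

```python
import re

TOKEN = re.compile(r"\(|\)|\*|\+|w/\d+|[a-z0-9]+!?")


def index_words(text):
    """Build the word -> positions table assumed by this sketch."""
    table = {}
    for position, word in enumerate(re.findall(r"[a-z0-9]+", text.lower())):
        table.setdefault(word, []).append(position)
    return table


def positions(table, term):
    """Positions of a term; a trailing '!' matches any word with that prefix."""
    if term.endswith("!"):
        stem = term[:-1]
        return [p for word, ps in table.items() if word.startswith(stem) for p in ps]
    return table.get(term, [])


def matches(expression, table):
    """Evaluate a search string such as '(a!*b)+(c w/2 d)' against the table."""
    tokens = TOKEN.findall(expression.lower())
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():                       # '+' is boolean OR
        result = parse_and()
        while peek() == "+":
            take()
            right = parse_and()           # always parse the right operand
            result = result or right
        return result

    def parse_and():                      # '*' is boolean AND
        result = parse_atom()
        while peek() == "*":
            take()
            right = parse_atom()
            result = result and right
        return result

    def parse_atom():
        token = take()
        if token == "(":
            result = parse_or()
            take()                        # consume the closing ')'
            return result
        left = positions(table, token)
        following = peek()
        if following is not None and following.startswith("w/"):
            distance = int(take()[2:])    # 'w/x': within x words
            right = positions(table, take())
            return any(abs(a - b) <= distance for a in left for b in right)
        return bool(left)

    return parse_or()
```

For instance, under these assumptions the example string above is satisfied by an item containing "semiconductor wafer", because ‘semicond!’ expands to "semiconductor" and "wafer" is also present.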

After spawning a maximum number of search instances in step 186, search controller 50 proceeds to step 188 and waits for a search instance 60 to finish. When a search instance 60 has finished testing the hash table with the search criteria, search controller 50 proceeds to step 190 and queries the finished search instance 60 whether the hash table satisfied the search criteria. If a match did not occur, search controller 50 jumps ahead to step 194. If a match occurred, search controller 50 moves the temporary local file to a more permanent location and stores the new location, the link address of the original informative item and the checksum in the database.

In step 194, search controller 50 deletes the instantiation of the finished search instance 60, proceeds to step 196 and determines whether any search items still remain for testing against the hash table. If so, search controller 50 returns to step 186 and spawns another search instance 60. If no search items remain, search controller 50 proceeds to step 198 and determines whether any search instances 60 are still examining the hash table. If so, search controller 50 returns to step 188 and waits for one of the executing search instances 60 to finish. If no search instances 60 are currently executing, search controller 50 proceeds from step 198 to step 200 and deletes the token that was popped from the token queue and the corresponding temporary file containing the informative item. Thus, unlike conventional search systems that store retrieved information to be used to satisfy future searches, software system 10 deletes all information that does not match current search criteria. In this manner, software system 10 conserves system resources and deterministically guarantees that each search item is tested with current information.
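The keep-or-delete decision for a retrieved item can be sketched as a single helper. The function name finish_token and the archive directory are hypothetical; the sketch only illustrates that a matching item is moved to a more permanent location while a non-matching item is removed.

```python
import os
import shutil
import tempfile


def finish_token(temp_path, matched, archive_dir):
    """Dispose of the temporary file for one token.

    If any search item matched, move the file into `archive_dir` and return
    its new location (to be stored with the link address and checksum).
    Otherwise delete the file and return None.
    """
    if matched:
        os.makedirs(archive_dir, exist_ok=True)
        destination = os.path.join(archive_dir, os.path.basename(temp_path))
        return shutil.move(temp_path, destination)
    os.remove(temp_path)
    return None


if __name__ == "__main__":
    work = tempfile.mkdtemp()
    item = os.path.join(work, "page.html")
    with open(item, "w") as handle:
        handle.write("<html></html>")
    print(finish_token(item, True, os.path.join(work, "matches")))
```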

After deleting the token, search controller 50 proceeds to step 182 and waits for the next token. In this manner, software system 10 deterministically monitors a plurality of network resources over a configurable period. In addition, software system 10 conserves resources by not searching pages that have already satisfied search criteria and have not been changed.

Referring again to FIG. 2, if a RestartSearchCycle control message is received, system executive 20 proceeds from step 108 to step 118 and restarts the monitoring period by invoking a load balancing technique. FIG. 7 is a flow-chart illustrating in detail one mode of operation of software system 10 for restarting the monitoring period in step 118. In step 200, system executive 20 sets local variable D equal to the total days of the monitoring period as configured by the operator and stored in the database. This allows the operator to control the period over which the set of network resources is completely monitored. Next, system executive 20 sets a local variable CD equal to the current date. System executive 20 instructs database manager 80 to set the starting date of the current search cycle to the current date. Next, system executive 20 commands database manager 80 to set the ending date of the search cycle to the current date plus the number of days in the monitoring period.

After setting the start and ending dates in the database, system executive 20 proceeds to step 202. As discussed in detail below, software system 10 may be distributed over a number of computers. In step 202, system executive 20 queries database manager 80 for a list of all of the computers in the distributed system that traverse network resources by executing collection controllers 30. Based on this list, system executive 20 sets a local variable (TC) to the total number of computers in the distributed system that operate as such. Next, system executive 20 instructs database manager 80 to access each network resource identifier stored in the database and retrieve a number of known pages (RKP) for each resource. This value is set whenever a collection controller 30 successfully traverses an entire network resource and indicates the total number of pages retrieved from the resource. As described in detail below, system executive 20 balances the tracked network resources across the number of computers in the distributed system according to the previous number of pages retrieved for the network resources, thereby more accurately load balancing the system. As database manager 80 accesses each network resource identifier stored in the database, a running total of the number of pages (TP) is maintained.

System executive 20 proceeds from step 202 to step 204 and calculates an average daily pages (ADP) by dividing the total pages by the days in the current monitoring period. System executive 20 further calculates an average pages per computer (APC) by dividing the average daily pages by the total number of computers in the distributed system. This value, APC, reflects the average number of pages (informative items) each computer should retrieve per day for the system to be optimally balanced. In step 206, system executive 20 clears a local variable current computer pages (CCP) and sets another variable, current computer (CCR), to the first computer in the list of computers that execute collection controllers 30. After initializing these variables, system executive 20 proceeds to step 208 and begins the load balancing process.

In step 208, system executive 20 commands database manager 80 to once again access each network resource identifier stored in the database. For each network resource identifier, system executive 20 repeats steps 210, 212 and 214. In step 210, system executive 20 commands database manager 80 to set the network resource identifier's next search date to the date stored in the local variable CD. Initially, this value will be the current date. In addition, system executive 20 commands database manager 80 to set the identifier's search computer to the computer stored in the local variable CCR. System executive 20 adds the number of known pages (RKP) for each resource to the variable CCP, thereby keeping track of the total number of pages assigned to the current computer.

System executive 20 proceeds from step 210 to step 212 and checks whether the number of pages assigned to the current computer has exceeded the average (APC) as calculated above. If not, system executive 20 jumps back to step 208 and continues through the network resource identifiers. If the number of pages assigned to the current computer has exceeded the average, system executive 20 proceeds from step 212 to step 214 and sets the local variable CCR to the next computer in the list received from database manager 80. If the list has been exhausted, CCR is set to the first computer in the list. Next, system executive 20 resets the variable CCP and jumps back to step 208. When all of the network resource entries in the database have been updated, system executive 20 jumps from step 208 to step 104 (FIG. 2) and waits for another control message. In this manner, system executive 20 sets the next search date and search computer for each network resource. Furthermore, the network resources are evenly balanced throughout the monitoring period and across the computers of the distributed system. This balancing is improved by using stored information on the number of pages previously retrieved from each network resource. Furthermore, the search cycle can be restarted manually by the operator or by the alarm thread when the current search cycle has completed. In this manner, software system 10 balances the tracking of the network resources upon the completion of each monitoring period.
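The balancing pass of steps 200 through 214 can be sketched as below. This is only an illustration under stated assumptions: the function name `restart_search_cycle` and the in-memory data shapes are hypothetical, and the text does not state when the variable CD advances, so the sketch assumes CD moves forward one day each time the list of computers is exhausted.

```python
from datetime import date, timedelta


def restart_search_cycle(resources, computers, period_days, today):
    """Assign each resource a next search date and a collection computer.

    resources: list of (resource_id, known_pages) pairs; known_pages is RKP.
    computers: ordered list of computers running collection controllers (TC of them).
    period_days: D, the configured monitoring period in days.
    today: the current date, used as the initial value of CD.
    Returns a mapping of resource_id -> (next_search_date, search_computer).
    """
    total_pages = sum(pages for _, pages in resources)           # TP
    avg_daily_pages = total_pages / period_days                  # ADP = TP / D
    avg_pages_per_computer = avg_daily_pages / len(computers)    # APC = ADP / TC

    current_date = today      # CD
    computer_index = 0        # position of CCR in the list of computers
    current_pages = 0         # CCP
    schedule = {}

    for resource_id, known_pages in resources:
        # Step 210: assign the date and computer, then count the resource's pages.
        schedule[resource_id] = (current_date, computers[computer_index])
        current_pages += known_pages
        # Step 212: has the current computer exceeded its daily share?
        if current_pages > avg_pages_per_computer:
            # Step 214: move to the next computer, wrapping to the first one.
            computer_index += 1
            if computer_index == len(computers):
                computer_index = 0
                # Assumption (not stated in the text): advance CD by one day on wrap.
                current_date += timedelta(days=1)
            current_pages = 0
    return schedule
```

For instance, with four resources of 300, 100, 300 and 100 known pages, two computers and a two-day period, APC is 200 pages; the first resource fills the first computer, the next two go to the second computer, and the fourth resource falls on the following day.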

Referring again to FIG. 2, if a GenerateReports control message is received, system executive 20 proceeds from step 110 to step 120 and commands report generator 110 (FIG. 1) to generate client reports. To create a client report, report generator 110 instructs database manager 80 to retrieve all of the link addresses and permanent file locations recently stored by search controller 50 for informative items that satisfied one or more of the client's search items. The reports can be generated in a variety of forms.

In one embodiment, report generator 110 constructs a hierarchy of HTML files that comprise the client's report and may be viewed by a conventional browser. A main HTML file contains a list of each search item for the client. When one of the search items is selected, the browser displays a second HTML file that more fully describes the search item and its corresponding search criteria. In addition, the second HTML file includes a list of each informative item that satisfied the selected search item's criteria. When one of the informative items is selected, the browser displays the selected informative item with any text that satisfied the search criteria highlighted. In this embodiment, the hierarchy of HTML files includes an HTML file for each informative item. In order to communicate the report to the client, the entire hierarchy of files is placed on a diskette, or other suitable media such as a CD-ROM, and mailed to the corresponding client. Alternatively, the files may be communicated via electronic mail to the client. Preferably, the electronic communication is encrypted to maximize confidentiality.

In another embodiment, report generator 110 constructs an HTML file for each search item. The HTML file fully describes the search item and its corresponding search criteria. In addition, the HTML file includes a list of informative items that satisfied the selected search item's criteria. Unlike the embodiment described above, in this embodiment, a client report does not actually include the informative items. The HTML file is constructed such that when one of the informative items is selected, the browser follows the link address to the actual network resource containing the item, retrieves the item and displays the item. As in the previous embodiment, each HTML file may be placed on a diskette or electronically mailed to the client.
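A link-only report of the kind just described might be generated along the following lines. The function name `build_link_report`, its parameters and the page layout are illustrative assumptions; the text does not specify the markup that report generator 110 actually emits.

```python
from html import escape


def build_link_report(search_item, criteria, matches):
    """Return an HTML page describing one search item and linking to its matches.

    matches: list of (title, link_address) pairs for informative items that
    satisfied the criteria. The items themselves are not embedded in the page.
    """
    rows = "\n".join(
        f'    <li><a href="{escape(link, quote=True)}">{escape(title)}</a></li>'
        for title, link in matches
    )
    return (
        "<html>\n<body>\n"
        f"  <h1>{escape(search_item)}</h1>\n"
        f"  <p>Criteria: {escape(criteria)}</p>\n"
        f"  <ul>\n{rows}\n  </ul>\n"
        "</body>\n</html>\n"
    )
```

Because each entry carries only the link address, selecting an entry causes the browser to fetch the item from the original network resource, so the report stays small but depends on the resource still being reachable.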

In yet another embodiment, the report generator 110 retrieves the base address for each network resource that satisfied one or more of a client's search items. Unlike the previous embodiments, report generator 110 does not construct a report based on the network resource's matching informative items but traverses the entire network resource in order to construct a hierarchy of HTML files that form a comprehensive site index. More specifically, report generator 110 formulates a list of every word disclosed by the informative items of the network resource. Based on this list, report generator 110 constructs the index that provides a link to each usage. FIG. 8 illustrates one portion of a sample index. When a particular usage is selected, the browser displays the informative item with the usage highlighted.
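The word-to-usage index described above resembles a conventional inverted index. The sketch below is one possible reading, assuming each informative item is available as plain text; the name `build_site_index`, the word pattern and the use of character offsets to identify a usage are assumptions made for illustration.

```python
import re
from collections import defaultdict


def build_site_index(pages):
    """Map every word found on a site to the pages and offsets where it occurs.

    pages: mapping of page address -> plain text of that page.
    Returns a dict, sorted by word, of word -> list of (address, character offset).
    """
    index = defaultdict(list)
    for address, text in pages.items():
        # Treat runs of letters and digits as words; case is folded for lookup.
        for match in re.finditer(r"[A-Za-z0-9]+", text):
            index[match.group().lower()].append((address, match.start()))
    return dict(sorted(index.items()))
```

Each (address, offset) pair identifies one usage, which is enough for a report page to link to the informative item and highlight the word at that position.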

Referring again to FIG. 2, if a Shutdown control message is received, system executive 20 proceeds from step 114 to step 124 and deletes database manager 80, search controller 50, token queue 55 and report generator 110. After successful deletion of the various components, system executive 20 and software system 10 terminate.

The present invention described above is suitable for executing on a single computer having a storage device and network interface such as a network card, an ISDN terminal adapter or a high-speed modem. The present invention, however, may readily be distributed across a system having multiple computers in order to efficiently monitor large numbers of network resources.

FIG. 9 is a block diagram of a distributed computing system 300 for executing software system 10 (FIG. 1) to confidentially access information present on global computer networks, such as the Internet, in accordance with the present invention. Computing system 300 comprises a plurality of computing devices, including collection nodes 310, search nodes 320, database server 330 and user interface device 340, that are communicatively coupled via network 345. As explained in detail below, each of these computing devices executes one copy of software system 10. Upon execution on each computing device, system executive 20 of software system 10 determines the type of computing device and operates accordingly.

First, system executive 20 determines whether the particular computing device is database server 330. If so, system executive 20 instantiates database manager 80 as a server that directly controls access to the database. If not, system executive 20 instantiates database manager 80 as a client that handles access requests by making a remote procedure call (RPC) to the database manager 80 of database server 330. In addition, system executive 20 determines whether the particular computing device is a collection node 310, a search node 320 or a user interface device 340.

Next, for collection nodes 310, system executive 20 instantiates token queue 55 as an RPC client. For search nodes 320, system executive 20 instantiates token queue 55 as a server that receives tokens over network 345 via RPC calls. Each system executive 20 of collection nodes 310 spawns one or more collection controllers 30 in order to traverse the network resources that are due and are assigned to the corresponding collection node 310. Collection nodes 310 access Internet 360 via router 350. The retrieved informative items are passed to the token queue client, which communicates pertinent information, such as the link address and local file location, to a token queue server of one of the search nodes 320. Each system executive 20 of search nodes 320 spawns search controller 50 to accept tokens from token queue 55 and search any received token as illustrated in FIG. 6 described above. In this manner, the informative items retrieved by collection nodes 310 are distributed evenly to search nodes 320, thereby allowing efficient monitoring of vast numbers of network resources.

In one embodiment, network 345 of computing system 300 allows remote access via authorized clients. For example, in one embodiment, user interface device 340 executes Windows NT and handles remote clients using Remote Access Server (RAS). In another embodiment, network 345 supports a virtual private dial network. In this embodiment clients are able to view their corresponding search items, and recently retrieved informative items that matched their search criteria, without communicating confidential information over Internet 360. Thus, unlike conventional search engines, the present invention allows clients to automatically monitor a plurality of network resources over a configured monitoring period without ever communicating the confidential search criteria over an insecure network.

In order to allow an operator to control and configure distributed computing system 300, system executive 20 instantiates user interface 90 upon determining that the computing device is user interface device 340. For example, when various computing devices are added or removed from computing system 300, user interface 90 allows an operator to update the database via database server 330.
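The role-dependent start-up performed by system executive 20 on each device of FIG. 9 can be summarized in a short sketch. The function name `components_for_device`, the device-type strings and the returned dictionary are hypothetical; the sketch only records which components the description says are instantiated for each device type and in which mode.

```python
DEVICE_TYPES = {"database_server", "collection_node", "search_node", "user_interface"}


def components_for_device(device_type):
    """Return the components a system executive would instantiate for a device type."""
    if device_type not in DEVICE_TYPES:
        raise ValueError(f"unknown device type: {device_type}")

    # Only the database server controls the database directly; all other
    # devices reach it through a database manager acting as an RPC client.
    components = {
        "database_manager": "server" if device_type == "database_server" else "rpc_client"
    }
    if device_type == "collection_node":
        components["token_queue"] = "rpc_client"          # forwards tokens to a search node
        components["collection_controllers"] = "one_or_more"
    elif device_type == "search_node":
        components["token_queue"] = "server"              # receives tokens over the network
        components["search_controller"] = True
    elif device_type == "user_interface":
        components["user_interface"] = True               # operator control and configuration
    return components
```

Keeping the decision in one place reflects the description that every device runs the same copy of the software and differs only in what it instantiates at start-up.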

FIG. 10 is a block diagram of one embodiment of a global networked environment 400 in which service center 405 executes software system 10 (FIG. 1) in accordance with the present invention. In one embodiment, service center 405 includes distributed computing system 300 (FIG. 9) and executes software system 10 as described above. In addition to confidentially monitoring information as described above, service center 405 integrates advertising and processes electronic orders for patents, file wrappers and technical disclosures (described below). Individual users 415 communicate with service center 405 over a global computer network, such as the Internet, in order to view secure accounts that contain their corresponding search items and any informative items found by service center 405 that satisfy the search criteria. In one embodiment, all communications between users 415 and service center 405 are encrypted and digitally signed and authenticated, thereby ensuring confidentiality.

In one aspect, service center 405 is configured to communicate with intellectual property (IP) management software 410 executing within organization 420, which may be any entity such as a corporation, legal firm, etc. In one embodiment, all communications between service center 405 and organization 420 are encrypted and digitally signed and authenticated, thereby ensuring confidentiality. IP management software 410 is any software suitable for presenting information and status regarding the intellectual property of organization 420. For example, IP management software 410 integrates docketing information, guidelines, templates and existing confidential disclosure agreements.

One beneficial feature of the present invention is that as organization 420 gains new intellectual property, information is automatically (and confidentially) communicated from IP management software 410 to an account within service center 405. The information regarding the new intellectual property is received by service center 405 and added, as a search item with appropriate search criteria, to the account of organization 420. Once received, service center 405 begins monitoring global computer networks for any information regarding the new intellectual property. Thus, the present invention eliminates the need for organization 420 to manually upload information regarding new intellectual property, such as patents and trademarks. Service center 405 sends an alert, such as an email, to organization 420 and users 415 when relevant informative items are added to their accounts.

From time to time inventors use technical disclosure services to publish information they want in the public domain but have decided not to pursue via patent or product. This service, however, is quite expensive and may cost up to $300 per page. Conventional services publish the disclosures anonymously in many countries. Such a service is basically a defensive measure by which the inventors prevent others from patenting the idea.

The present invention contemplates a technical publication service that anonymously publishes information on global computer networks. More specifically, users log into a website and submit technical disclosures. Preferably the disclosures are in text, Acrobat (pdf), Microsoft Word or any other commonly used format. When a user submits a disclosure, he or she also submits an abstract and perhaps identifies key terms that best describe the disclosure. According to the present invention, after receipt of the disclosure the network service automatically:

-   1. Adds the submitted electronic disclosure to a collection of other stored publications. In one embodiment the collection of electronic publications is maintained in a jukebox of recordable CDs.
-   2. Updates a publicly available database, thereby making the publication immediately available to anyone who can access the global computer network.
-   3. Transmits the stored location of the received electronic disclosure, as well as the abstract and key terms, to a plurality of major search engines, thereby making the new disclosure immediately accessible and locatable.
-   4. Accesses one of the search engines and exercises the engine to look for any documents that satisfy the key terms of the received disclosure. While accessing the search engine, the service records the results for future proof of publication.
-   5. Communicates the results to the user via email or paper so that the user can offer the results as evidence that the disclosure was indeed published and available to the public.
-   6. Maintains the received publication in the database for a fixed period of time, thereby allowing the public to retrieve and view the document.

Various embodiments of a method and system for confidentially accessing and reporting information present on global computer networks have been described. This application is intended to cover any adaptations or variations of the present invention. It is manifestly intended that this invention be limited only by the claims and equivalents thereof.

1-14. (canceled)
15. A system comprising: a first computing system configured to execute a search tool that collects content from network resources, searches the collected content according to a set of search items, and stores the content upon occurrence of a match between a search item and the content; and a second computing system configured to execute intellectual property management software that presents information regarding intellectual property of an organization, the intellectual property management software being configured to communicate occurrence of the organization acquiring a new intellectual property asset to the first computing system; wherein the search tool is further configured to create at least one search item based at least partially upon the communication from the second computing system.
16. The system of claim 15, wherein the first computer system comprises a plurality of computing devices configured to cooperate with one another to implement the search tool.
17. The system of claim 15, wherein communication between the first and second computing systems is encrypted.
18. The system of claim 15, wherein the intellectual property management software is further configured to docket dates related to intellectual property assets.
19. The system of claim 15, wherein the intellectual property management software is further configured to store and retrieve agreements related to intellectual property of the organization.
20. The system of claim 19, wherein the agreements include confidential disclosure agreements.
21. The system of claim 15, wherein the search tool generates a report for the organization, based at least in part upon the collected content that matches the search items created for the organization.
22. The system of claim 21, wherein the report is communicated to the organization via electronic communication.
23. The system of claim 22, wherein the electronic communication comprises electronic mail.
24. The system of claim 21, wherein the report includes hyperlinks pointing to network documents containing the content that matches the search items.
25. The system of claim 21, wherein the report includes the content that matches the search items.
26. A method comprising: collecting content from network resources; searching the collected content according to a set of search items; storing the content upon occurrence of a match between a search item and the content; receiving a communication indicating the occurrence of an organization acquiring a new intellectual property asset; and creating at least one search item, based at least partially upon the communication.
27. The method of claim 26, wherein the communication is encrypted.
28. The method of claim 26, further comprising generating a report for the organization, based at least upon the collected content that matches the at least one search item created for the organization.
29. The method of claim 28, further comprising communicating the report to the organization.
30. The method of claim 28, wherein the report includes hyperlinks pointing to network documents containing the content that matches the search items.
31. The method of claim 28, wherein the report includes the content that matches the search items.
32. A method comprising: presenting information regarding intellectual property of an organization; determining that the organization has acquired a new intellectual property asset; communicating the determination to a remote computing system that runs a search tool; and receiving a search report from the remote computing system, the search report containing information concerning the new intellectual property asset of the organization.
33. The method of claim 32, wherein the communication to the remote computing system is encrypted.
34. The method of claim 32, further comprising docketing dates related to the intellectual property of the organization.
35. The method of claim 32, further comprising storing and retrieving agreements related to the intellectual property of the organization.
36. The method of claim 35, wherein the agreements include confidential disclosure agreements.